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Abstract 

One of the main necessities for population geneticists is the availabihty of 
statistical tools that enable to accept or reject the neutral Wright-Fisher 
model with high power. A number of statistical tests have been developed 
to detect specific deviations from the null frequency spectrum in different 
directions (i.e., Tajima's D, Fu and Li's F and D test, Fay and Wu's H). 
Recently, a general framework was proposed to generate all neutrality tests 
that are linear functions of the frequency spectrum. In this framework, a 
family of optimal tests was developed to have almost maximum power against 
a specific alternative evolutionary scenario. Following these developments, in 
this paper we provide a thorough discussion of linear and nonlinear neutrality 
tests. First, we present the general framework for linear tests and emphasize 
the importance of the property of scalability with the sample size (that is, the 
results of the tests should not depend on the sample size), which, if missing, 
can guide to errors in data interpretation. The motivation and structure of 
linear optimal tests are discussed. In a further generalization, we develop 
a general framework for nonlinear neutrality tests and we derive nonlinear 
optimal tests for polynomials of any degree in the frequency spectrum. 

Keywords: Coalescent theory. Site frequency spectrum. Population 
genetics. Statistical power. Summary statistics 
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1. Introduction 

Statistical tests for neutrality are important and useful tools for popula- 
tion genetics. Since the development of molecular genetics techniques allowed 
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to obtain nucleotide sequences for the study of populations genetics pQ, a 
number of neutrality tests have been developed with the objective to facili- 
tate the interpretation of an increasing volume of molecular data. Statistical 
tests for neutrality have been generated exploiting the different properties of 
the stationary neutral model. Examples of tests are the HKA [2|, which takes 
advantage of the polymorphism/ divergence relationship across independent 
loci in a multilocus framework, and the Lewontin-Krakauer test p], which 
looks for an unexpected level of population differentiation in a locus in re- 
lation to other loci. Also, there is another family of tests related to linkage 
disequilibrium, as the one developed by which detect long haplotypes at 
unusual elevated frequencies in candidate regions. 

An important family of these tests, often used as summary statistics, 
is built on the frequency spectrum of allele polymorphisms. This family 
includes the well known tests by Tajima |5j, Fu and Li [Gj and Fay and 
Wu [7]. If an outgroup is available, these tests are based on the unfolded 
spectrum ^j, that is, the number of segregating sites with a derived allele 
frequency of i in a sample of (haploid) size n. Without an outgroup, it is 
not possible to distinguish derived and ancestral alleles and the only available 
data correspond to the folded spectrum 77^, that is, the number of segregating 
sites with a minor allele frequency of i. The quantities and rji are all positive 
and the range of allele frequencies is 1 < i < n — 1 for the unfolded spectrum, 
1 < i < [n/2j for the folded spectrum. The average spectra for the standard 
Wright-Fisher neutral model are given by 

1 Tl 

Ei(,)^fL , ^M = ,(„_,)(,^,_/ ^. (1) 

where L is the length of the sequence and 6 = 2pfiNe, where /i is the mutation 
rate, p is the ploidy and A^^e is the effective population siz^ Note that the 
spectra are proportional to 9. 

In a recent paper by Achaz [S], a general framework for these tests was 
presented. The general tests proposed there were of the form 

^Var ^Var (^.tf J^^v,) 



^Note that we define 9 as the rescaled mutation rate per base and not per sequence. 
Apart from this, we follow the notation of [8^ and [9]. 
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that are centered (i.e., they have a null expectation value) if the weights 

^i,^* satisfy the conditions Yl^=i — ^\=i^ — 0- This frame- 
work allows the construction of many new neutrality tests and has been used 
to obtain optimal tests for specific alternative scenarios pO]. However the 
original framework does not take into account the dependence of the tests on 
the sample size, as emphasized in pTO]. It is important to choose this depen- 
dence in order to obtain results that are as independent as possible on sample 
size. Moreover, the framework presented in |H] covers just a large subfamily 
of neutrality tests based on the frequency spectrum, that is, the class of tests 
that are linear functions of the spectrum. This subfamily contains almost all 
the tests that can be found in the literature with the exception of the G^, Gn 
tests of Fu [11], which are second order polynomials in the spectrum whose 
form is related with Hotelling statistics. Since these G^, tests were shown 
to be quite effective in some scenarios, it would be interesting to build a 
general framework for these quadratic (and more generally nonlinear) tests. 

In this paper we provide a detailed study of the properties of the whole 
family of tests based on allele frequency spectrum, beginning with the dis- 
cussion of the most interesting case, i.e., linear tests. We present a thorough 
analysis of a simple proposal for the scaling of the tests with the sample size, 
then we analyze the geometrical properties of the optimal tests presented 
in [To] and we propose generalizations of D' test to general linear tests and 
linear optimal tests. Finally, we go beyond the framework presented in [8] 
and discuss the most general class of tests, that is, polynomials of any order 
in ^i,r]i, and obtain the optimal tests for polynomials of any order. These 
results allow to build new and more effective tests. The proofs can be found 
in Appendix [B] 

2. Linear neutrality tests 

2.1. General framework 

As discussed by Achaz [H|, the general form for linear tests based on the 
unfolded spectrum can be written as 



(3) 
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where f2j is a set of weights satisfying the condition 



n— 1 



J2^. = 0. (4) 



This is the most general form if we require that the test is centered and 
with variance 1, that is, E{Tq) = and Var(Tn) = 1. The condition of 
centeredness can be obtained substituting the spectrum with its average in 
the standard neutral model, given by the equations ([T]). 

Alternatively, it is sufficient to choose any pair of unbiased estimators of 
6 based on the unfolded spectrum 

^ n— 1 ^ n— 1 

i=l 1=1 

with weights Ui,Ui that obey the conditions 

n— 1 n— 1 

i=l i=l 

to obtain a new test for neutrality: 



Var(^. - M yVar(E;r^j(^.-^)e.) ^V^E^^J^ 

(7) 

that is equivalent to the definition (|3| with f2j = — w^'. Therefore a test 
Tq is defined by real vectors Q or satisfying the above normalization 
conditions. 

If an outgroup is not available, then the test should be based on the folded 
spectrum and has the general form: 



J:tl^^in-^)il + 6n,2^mV^ 



'Var j{n - + Sn,2,)^*Vj) 



(8) 
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where the weights fl* = u* — u*' satisfy the conditions 



Ln/2J L"/2J [n/2i 

1=1 1=1 1=1 



and 



^. = ^L ^^^^ ' ^"' = ZZ. ^ ^^^^ 

i=l i=l 

(10) 

are unbiased estimators of 9. 



2.2. Transformations of weights and invariance of tests 

We report some theorems on the invariance of the tests under affine 
transformations. These results can be easily proved and are implicitly used 
throughout this paper. 

Theorem 1. A test of the form does not change its value if all the 
weights Qi are rescaled by a common factor A > 0, that is, 

^ Xn, Tn-^ sign(A)rf, (11) 

Note that the invariance of the tests mean that these transformations de- 
fine equivalence classes of weights, i.e., sets of different weights that actually 
correspond to the same test. In particular, this theorem implies that the 
space of possible tests, in terms of the weights is not homeomorphic to 
]R"~2 (^^hich would be the subspace of weights in ]R"~^ that satisfy the linear 
condition (|4])) but to its quotient with respect to the invariance (multiplica- 
tive) group ]R+, that is, the [n — 3)-dimensional sphere S^~^ = ]R'^~^/]R"'". 

Theorem 2. A test of the form ^ does not change its value under an affine 
transformation of parameters {\, Pi) on the weights Ui, uj[ with a common 
rescaling factor A > 0, that is, 

Ui — > Xui + Pi , Ji — > Xbj'i + Pi Tq — > sign(A)rf7 (12) 

However, the estimators are unbiased only if the rescaling factor satisfies 
the condition A = 1 — Yl'i=i Pi- 
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2.3. Generalized D' tests for multilocus analysis 

The statistic D' [12j, which is defined as the ratio of Tajima's D versus 
its minimum value (given a fixed number of segregating sites), has been used 
in the hterature for multilocus analyses [121 [131 El] , arguing that the value 
of Tajima's D is affected by the length, the sample size and the number of 
segregating sites of each studied locus and therefore the values of each locus 
are not directly comparable. 

The contribution of each locus to the heterogeneity is hardly known. 
Tajima's D is robust to differences in the level of variability (the variance 
is approximately equal to one) and also quite robust against differences in 
sample size (as is will be shown in the next part), although the quantitative 
values of Tajima's D for each condition are someway different and therefore 
the comparison between values is not simple. The proposal of Schaeffer is to 
use the test 

^' = —77^ (13) 

as a (re) normalized version of Tajima's D. Sobs is the observed number of 
segregating sites in the sample. This proposal can be generalized for all the 
tests of the form (|3| as follows: 

^ mm{Tn)s=s,,, mm{jQj)Sobs 
This appears to be the natural generalization of D' to general linear tests. 



3. Sample size independent tests 

3.1. Scaling of weights with sample size 

In this section we would like to remark that there are conditions that 
have to be imposed on the weights Qi or Ui, uj[ to ensure that these tests are 
consistent and meaningful. In fact, the values (and even the number!) of 
these weights depend explicitly on sample size n. Since every conceivable 
test should be applied to samples of different size, then its definition involves 
a whole family of weights or jco'^-"'', c<;^*-"^| with n = 2, 3 . . . oo and to 

define a test it is necessary to specify how these weights scale with n. 

As an example of the weird effects of some choices of scaling, we consider 
the test for admixture of [S]. The weights of this test are Ui = (")2""(1 — 
2-n+i-j-i g^j^^ ^/ _ 1/(^72 — 1). Suppose that the population under study shows 
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-10»^ 1 1 , ^ r- 

D D4 D.6 0^ I 

Figure 1: Illustrative example of the dependence of the weight on sample size: weight 
as a function oi i/n for the test for admixture by Achaz, plotted for different sample size 
n = 10 (blue), 100 (red), 1000 (yellow). 



an excess of alleles of frequency / between 0.3 and 0.4. The average weight 
of these frequencies, rescaled by the sample size, is 0.5 for n = 10, but it 
reduces to -0.75 for n = 100 and to -1.0 for n = 1000. These weights are 
largely different, even in sign, therefore a strong excess of alleles in this range 
of frequency would show itself as either a positive or a negative value for this 
test, depending on the sample size! The reason can be understood by noticing 
that for n large, the binomial can be approximated by a Gaussian function 
of the allele frequencies f = i/n centered in / = 1/2 and with variance l/4n. 
Therefore this weight function has a strong dependence on n when considered 
as a function of / and n. The changes of this weight function with sample 
size are apparent in the plot of Figure [T| which shows the actual function 
(rescaled by sample size) for n = 10, 100, 1000. In this example it is apparent 
that the interpretation of the results of this test depends on n. This means 
that the calibration of the test should be different for each possible sample 
size. 

The consistency requirement that we propose is that the result of the test 
should be almost independent on sample size. This requirement is equivalent 
to a condition on the scaling of the weights fi-"^ with n. Our proposal for a 
reasonable requirement on this scaling is the following: the relative weight of 
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different frequencies in the population should remain approximately constant 
while varying the size of the sample. This condition ensures that at least for 
sufficiently large n, the average values of the test on samples of different size 
from the same population should be approximately independent on sample 
size, i.e. that the test should be consistent. 

To determine the scaling, we note that in limit of large ra, the frequency 
spectrum approaches a continuum and we can define the weights as func- 
tions nif) or u{f),u'if) with / G (0,1) and rf/^](/) = 0, J.'dfuif) = 

df uj'{f) = 1. Since the ratio of the derived allele count and the sample size 
i/n is an unbiased estimator of the frequency / of the allele in the population 
(because E{i) = nf), a simple scaling that satisfies the above requirement is 

fi-"-* ~ ^(2/72) or w-"^ ^uj{i/n) , w ~ uj'{i/n) (15) 

as proposed by some of the authors in [;10j. 

In order to have the above approximate scaling while obeying the condi- 
tion Y17=i ~ 0' there are two simple consistent forms for the weights: 



where the last term is a (tipically small) correction that enforce centeredness 
of the test, or 

where the denominators are normalization factors. 



Tipically this second form (17) for the scaling is more consistent in prac- 
tice and it is implicitly assumed for most of the existing tests. However, 
the above espressions give similar numerical results for most choices of the 
functions ^{f) = w(/) — uj'{f). In fact, if ^{f) is a limited and piecewise- 



continuous function, the difference between (16) and (17) is of order 0{Q)/n 
(since it is a factor coming from the discretization of the frequencies) and it 
does not have a relevant impact on the results of the test. Therefore in these 



cases the two scaling relations (16) and (17) are practically equivalent. 

Note that all the tests involving the Watterson estimator (that corre- 
sponds to w(/) ~ 1//) have additional subtleties that are discussed in the 
next section. 
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Example: Fay and Wu's H test 

This test was proposed in [7] to look for an excess of high-frequency de- 
rived alleles as a signal of selection. It can be defined by the weight functions 
uj{f) = 2/ and uj'{f) = 1. The weights can be found following equation (17). 
The resulting test is 

J-H = — , = 



(n(n-l) Sj=l ^^^i n-1 ^i=l -^^J 



The scaling defined in equation (16), with weight function = oj{f) — 

oj'{f) = 2/ — 1, gives precisely the same result. 

Example: F{r, r') tests of Fu ITB^ 

This large class of test is based on the comparison of two estimators with 
weights 



that in the case r,r' < 1 correspond precisely to the scaling (17) suggested 
above, with weight functions a; (/) = (1 — r)/"'" anda;'(/) = (1 — r')/~^' . This 
can be easily verified by multiplying both the numerator and the denominator 
of Ui, u:[ by a factor (1 — r)/n~'^ , (1 — r')/n~'^ respectively. The test by Fay 
and Wu corresponds actually to F(— 1, 0). 

The cases with r > 1 or r' > 1 involve weight functions with divergent 
integrals and will be discussed in the next section. 

Note that the same weight functions with the scaling (16) would give rise 
to a slightly different test with weights 

(20) 

that is not consistent for weights of low frequency alleles, i.e. with i/n < 
^2/max{r,r')^ and therefore less interesting. 

Example: test for bottleneck of Achaz 

This test is another example of a test with an unwanted scaling: 

UJi = =^7^3! : , uj'i = 7 (21) 
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The weight function for this test is e~""-^an/(l — e~"") — 1 that depends 
strongly on n, therefore this test is not consistent in the above sense. 

It is easy to build an equivalent test with the correct scaling by choosing 
the functions uj{f) = (3e~^^ / (1 — e~^), uj'{f) = 1. The resulting weights with 



the scaling (17) are 

-f5i/n 1 _ p-P/n 



-K^n-l i3j/n 1 _ g-/3(l-l/n) 
Z-/7 = l 



uj- = - = ^ - rj' = 1: (22) 



n 



as discussed before. The optimal value reported in [8] is a ~ 0.9 for n = 30. 
This value corresponds to /3 ~ 27. 

The test can also be implemented by choosing the scaling (16) and the 
weight function = oj{f) — <^'{f) = /3e~^^ /{I — e~^) — 1. The resulting 
weights are 

1 ( /3(1 - e-/^(^-V")) , 
;3(l-e-/^(i-V")) / i-e-^/- 1 



(1 - e-/3)e/3/"(l - e-/5/") V 1 - e-/3(i-V«) n 



that are equivalent to the weights (22) up to an irrelevant multiplicative 



factor (see Theorem [T]). Therefore in this case the two choices of scaling give 
precisely the same result. 

3.2. Divergent weights 



As discussed above, the two choices of scaling in equation ( 16 ) and ( 17 ) do 
not usually bring to sensibly different numerical results. However, there are 
important choices of ^{f) for which this approximate equivalence between 



(16) and (17) does not hold. These critical cases correspond to functions 
that diverge as 1// or faster near / = (or / = 1). This divergence is not 
a real feature of the distribution, because the integral has a natural cutoff 
at the scale of the inverse population siz^ fmin = l/-^, but in this case 
the integral ^^^^ df VL{f) has a strong dependence on the cutoff 1/A^ and 
therefore the function itself should depend strongly on N to ensure 

proper normalization. 



^Or more precisely the effective population size l/N^, but this does not affect the above 
discussion. 
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If this dependence is contained in an multiplicative term in front of uj{f) 



or uj'{f) or both, then the second term in equation (16) is not a small cor- 
rection of order 1/n as it happens with simple functions ^{f), but rather 
it represents a relevant correction with a strong dependence on sample size 



n and population size N. The denominators in equation (17) also show a 
strong dependence on n (that could not be avoided anyway) but not on N, 
and therefore this second scaling form should be used. The dependence on 
sample size is as strong as the dependence of the divergent integral from the 
cutoff for functions diverging as with A; > 1, the dependence on n goes 
as n^~^ if /c > 1 or log(n) for k = 1. This case always occurs when the test is 
build by comparing an estimator of 6 with the Watterson estimator, which 
corresponds to uj{f) ~ 1// and therefore has a logarithmic dependence on n 
given by the usual harmonic factor a„ = X]j=i 1/j — log(^) + 7 + 0(l/n). 
A well-known examples of this case is Tajima's D 0. 

If the dependence of on N is contained in an additive term that 



does not depend on /, it is the correction in (16) that does not depend on 
N and therefore the first scaling form is more appropriate. We do not know 
examples of tests of this kind in the literature, even if the test by Zeng et al. 
[16l| can be interpreted also in this way. 



Example: Tajima's D test 

This is the most known test for neutrality based on the frequency spec- 
trum. It is given by the difference between the Tajima estimator 11 [T7] based 
on the nucleotide pairwise diversity 11 and the Watterson estimator 9w [E] 
based on the number S of segregating sites, therefore it can be defined by 
the weight functions oo{f) = 2(1 — /) for 11 and w'(/) = l//log(A^) for the 
Watterson estimator. The latter function has an integral that diverges loga- 
rithmically near / = 0, and the corresponding dependence on is contained 



in the factor l/log(A^) that multiplies oo'{f), therefore the scaling (17) should 



•^This can be easily understood by noticing that the sample size n plays the role of 
the cutoff in the sum over the frequencies that are present in the sample, which is the 
same role played by the population size N for the whole population. More formally, the 
denominator in equation (17) can be bounded from above and from below by the divergent 
integral, and therefore the divergence of the denominator as n — > oo will be the same as 
the divergence of the integral as its inverse cutoff (that is, N) goes to infinity. 
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be used. The result is the usual test 

Example: test of Zeng et al. [TUf 

This test was proposed to look for an excess of high-frequency derived 
alleles compared to low-frequency alleles. It is defined by the weight functions 
uj{f) = 1 and uj'{f) = 1/ f\og{N), the latter corresponding to the Watterson 
estimator. Proceeding as in the above example, the result is 

= ^ (25) 



Note that exceptionally the scaling of this test can also be defined by (16), 
without modifying the result. This is a consequence of the two equivalent 
forms for the weight function, = 1 — 1// log(A^) or fl{f) = log(A^) — 1//. 

3. 3. Weights of singletons 



The above scaling (15) is valid in principle for all weights. However in 
practice there is an important exception, that is, the weight Qi of singletons. 
This is due to the fact that for n N, the number of derived singletons ^1 is 
the only estimator that is affected by very rare derived alleles (and often by 
sequencing errors, see [19]). More precisely, ^1 is actually the only estimator 
sensitive to the deviations from neutrality in alleles of frequency 1/A^ < / < 
1/n, which represent a vast majority of the SNPs in the population and can 
contain interesting biological information. Therefore, if the contribution of 
these alleles is relevant for the test, we can enhance (or reduce) the weight 
fli hj adding a factor Qds- 

In the approach detailed in the previous sections, this additional contribu- 
tion to fli is needed to take into account a contribution AQ{f) to ^{f) of the 
form AQ{f) = Qdsl{f < 4>)/4' with <^ 1. As far as the maximum sample 
size never exceeds in practice Umax <^ 2/0, this function weights positively 
only alleles that appear as singletons. 

Similarly, ui and u[ can be enhanced by Uds, uj'^s ^^^t correspond to 
contributions Aa;(/) = ujdsl{f < (/>)/(/>, A.uj\f) = <^dsHf < 4>)/4>- The test 
of Fu and Li [6] fall into this case. 
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A similar argument applies also to the weights of the number of ancestral 
singletons, that is, fln-i, i^n-i, ^n~i ^^at can be enhanced by factors Qas, 
cOas and respectively. However this case is more rare, the only interest- 
ing example being the tests of Achaz |19] that avoid sequencing errors by 
neglecting both derived and ancestral singletons. 

Summarizing the results up to this section, a test Tq is completely defined 
by a function and two parameters Qds, ^as (that could depend on n) 
satisfying the conditions 

^ds + ^as+ ['dfQ{f) = (26) 
and determining the weights through the formula: 

nf"^ = n (^^^ +ndA,i+nasSi,n-i-^^ (^n^s + n,, + (^^^j (27) 

or by a pair of functions u{f),u;'{f) and parameters 0;^^, <^dsy ^as, ^'as satis- 
fying ^ ^ 

^ds + O0as+ [ df u{f) = u',, + + / df Lo'if) = 1 (28) 
Jo Jo 

and resulting in this formula for the scaling of the weights: 

^(n) ^ ^dskl+^as5i,n-l+Uj{^) _ C^dAl + ^a/»,n-l + [j] ^^^-^ 
Uds + UJas + Ej^l^ ^ ( ^) ^ds+<s + Ej^l^ ^'(i) 

As showed in the examples above and below, most of the tests in the 
literature have this general scaling, with the only exceptions of the ones 
contained in [8] that are not consistent in the above sense. 

Example: Fu and Li's F test 

This test looks for an excess of very rare derived alleles as a possible 
signature of negative selection [6]. The only nonzero weights are Uds = 1 and 
uj'{f) = 1/ f\og{N), while uj{f) = u'^^ = ujas = ^'as = 0- The resulting test is 

Tf = ^ ^1-'^/^" (30) 
A/Var (^1 - S/an) 

Note that this test has both singleton weights and a divergent weight func- 
tion. 
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Example: error- corrected tests of Achaz fWf 

This class of tests is an attempt to correct for sequencing errors and bi- 
ases in the data by removing the alleles where most of the problems manifest 
themselves, i.e. singletons (both ancestral and derived). With a slight gen- 
eralization of the proposal in [12], the weights of the singletons are chosen in 
such a way to cancel precisely the contributions of the weight functions: 

Qds = (l) , ^as = ~n(l~ ^] (31) 

or 



n / \ n 



u:,s = (^j ,u:^s = (l - Ij = -J (^j = -c.' (l - ^ 

(32) 

therefore the final weights of derived or ancestral singletons are zero. These 
corrections can be applied in principle to any weight function. 

3.4- Scaling of weights in tests without an outgroup 

The above arguments can be repeated in a straightforward way for the 
tests based on the folded spectrum r]i. The only relevant difference is 
that the frequency / of the minor allele in the population is always less than 
50%, that is, / G (0, 1/2]. For consistency with the unfolded case, the weight 
r]n/2 is reduced by a factor 2. Moreover, the additional parameters related to 
the weights of singletons cannot distinguish between ancestral and derived 
alleles and therefore reduce to VL*, u*, u*'. These parameter, together with 
the functions ^*{f), w*(/) and uj*'{f), should satisfy the conditions 



n:+ [ dfn*{f) = o , uj:+ [ dfio*{f) = uj:'+ [ dfio*\f) = i 

Jo Jo Jo 

(33) 



The formulae that determine the scaling of the weights are: 



= , ; n* [-]+ n:6,,i - 



1 / A 1 




(n) 



l + Sn,2^ \nj ' [n/2\ 

0O:S,,i + 00* (^) /(I + 6n,2^) UJ^' 6,^i + U*' (^) /(I + 6n,2^) 



(34) 



+ E T ^* /(I + < + E (^) /(I + 5. 



(35) 
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The weights of the folded versions of Tajima's D and Fu and Li's F* test 
follow this scaling. The nonzero weight functions are = 1, u}*'{f) = 

l/(log(iV)/(l - /)) for Tajima's D and u*, = 1, u*'{f) = l/(log(iV)/(l - /)) 
for the test of Fu and Li. 

3.5. Alternative choices of scaling 

The choice of scaling discussed in the previous sections represents a quite 
simple and effective way to fix the dependence on n of a newly devised test. 
However, other choices are possible whose weights differ from the above ones 
for small n. The reason is that for n not too large, both the variance of order 
/(I — f)/n ~ i{n — i)/n^ in the estimation of the frequency / = i/n and 
the related uncertaincy about how the frequencies are actually weighted in 
the test become important. This uncertaincy originates from the (binomial) 
sampling of individuals from the population and there is some degree of 
arbitrariness in deciding how to account for it. Moreover, tests that take it 
into account could be not consistent in the above sense. 

A possible choice of scaling that uses the binomial sampling is the fol- 
lowing: considering co{f), co'{f) as frequency distributions, the weights w,, 
oj'^ are assigned from u{f), ou'{f) through the same binomial sampling that 
is done for allele spectra, that is, 

..^ /o^/a)f(l-/)--(/) (36) 
/o^/(l-/"-(l-/)")c^(/) 

^, ^ /o^/a)f(i-/r-^^x/) .37. 

A simple example of this scaling (but with an highly divergent weight 
function) is given by the test for admixture discussed before. Optimal 
tests also follow this scaling. 

Example: test for admixture of Achaz 

This test is apparently not consistent and it does not follow the scaling 



(15). However it follows another scaling related to the allele sampling. To 
understand this, consider the weight functions uj{f) = 6{f — 1/2), uj'{f) = 1 
where 6{f — 1/2) is a Dirac delta functior|^ centered in 1/2. If we scale the 



''The Dirac delta S{f — a) is a function whose value is if / 7^ a and +00 if / = a. The 
integral / 6{f — a)g{f)df is g{a) if a is inside the range of integration and otherwise. 
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weights according to (p6)),(37), that is, 



/oCy(^)f(l-/)— u;(/) 



a)2 



— n 



(38) 



/orf/(i-/"-(i-m^(/) 1 



2-n+l 



/orf/(:)f(i-/)"-^c^'(/) 



1 



(39) 



/„^rf/(l-/--(l-/)")a;'(/) 



n — 1 



then the corresponding test is precisely the one proposed by Achaz. Note 
that the strong dependence of the test from sample size does not come only 
from the choice of scaling, but also from the weight function chosen, that is 
highly divergent. 

4. Linear optimal tests 

4-1. On the existence of generic tests 

An interesting question on the way to build good linear tests is the fol- 
lowing: do there exist generic tests? A completely generic test for neutrality 
should be able to detect any deviation from the spectrum of the null model 
that is sufficiently large. Unfortunately, these tests do not exist. In fact, for 
every test defined by a set of weights Vti it is possible to find a spectrum 
= a/ittn + (1 — a)Aj that is maximally different from the standard spec- 
trum at least in a range of frequencies and is nevertheless undetectable by 
the test because its average value on this spectrum is zero. This is expressed 
in a more formal way in the following theorem, which shows that even the 
complete lack of alleles in some range of frequencies could not be always 
detected. 

Theorem 3. For every set of n real weights VLi with ^ . Vti = 0, there is a 
set of n real numbers Aj 7^ const/i and a parameter a G [0,1] that satisfy 
the conditions 



Actually this function is not a mathematical function, but a distribution, i.e. an element 
of a dual space of regular functions. 





(40) 
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The above limitation is not a consequence of the small sample size. This 
can be seen for example in the framework of the scaling theory discussed in 
this paper. In fact, for large sample size, the weights can be approximated 
by a weight function In this context it is possible to prove the next 

theorem, that is a continuous equivalent of the previous one. 

Theorem 4. For every piece- wise continuous weight function ^l{f) G L^-^^j^^ 
such that Jyj^^il{f)df = 0, there is a smooth function A(/) ^ const/ f and 
a parameter a G [0, 1] that satisfy the conditions 

X'„*W)A(/)=0 . W_^(a.^ + (l-a)A(/))=0 (41) 

Note that in principle this problem can be solved using multiple tests. 
In fact multiple tests should be able to detect any strong deviation from the 
null spectrum, provided that the number of these tests is large enough, as 
can be seen from the following theorem. 

Theorem 5. Given at least n — 2 linearly independent sets of n — 1 real 
weights Qi with ^ • Qi = 0, it is not possible to find a set of real numbers 
Aj 7^ const /i such that "^-iQiAi = 0. 

This last theorem is only a formal result and the requirement of n — 2 
independent tests is too strong. In practice a small (but good) set of tests can 
detect most of the reasonable and interesting deviations for realistic spectra. 

The above theorems can be extended to the folded spectrum. In this 
section and the next ones, we will consider only tests based on the unfolded 
spectrum. The generalization of the discussion to the folded spectrum is 
usually straightforward after substituting [i = 1 . . .n — 1) with r/j [i = 
1...K2J). 

4-2. Optimal tests and their geometric structure 

From the theorems of the previous section, it is clear that a single test 
cannot detect all the possible deviations occurring in complicated evolution- 
ary scenarios. However it is still possible to optimize neutrality tests of for 
a specific alternative evolutionary scenario. A simple optimality condition 
has been proposed by some of the authors in [10] in order to maximize the 
power of the test to detect a fixed alternative scenario. If the null spec- 
trum is -E'(^t) = OL^^ and the expected spectrum of the alternative scenario 



18 



is S{^i) = OL^i, the condition for optimal tests is the maximization of the 
average resuh of the test under the ahernative scenario: 



Var (E;r!^-e./e° 



This condition is based on the observation that the tests have mean zero and 
variance 1, therefore if the distributions of the resuhs of the tests are similar, 
the maximization of the average value of the test should correspond to the 
maximization of the average power of the test. It is also possible to maximize 
directly the power of the test, taking into account the different distribution 
of the results under the null and the alternative model; this possibility will 
be pursued in section |4.5 



Interestingly, optimal tests show a geometric structure which becomes 
apparent after defining the scalar product between spectra: 

= (43) 

where c^-^ is the inverse of the covariance matrix Cov(^j, ^j). Since Cov(^j, ^j) 
is symmetric and positive, its inverse is also symmetric and positive, i.e. it 
is a positive bilinear form, therefore the above expression defines a scalar 
product. Then the optimal test for an alternative spectrum ^ can be written 
in the elegant forr 



Iq = , _ _ _ ^ 144J 



^((e,e»-((e°,e»V((e°,e°)) 



The numerator of the test is actually the matrix element between ^ and ^ 
of the linear operator 1 — P^o, where P^o is the projection operator along 
In other words, it is proportional to the difference between the length of the 
projection of on ^ and the length of the projection on of the spectrum 
obtained by the projection of ^ on as illustrated in Figure [2] 



^We do not provide a proof of this expression here because it can be easily obtained 



as a special case of the general formula (67) that we will discuss later in the context of 



nonlinear tests. A direct proof of this result can be found in [10] after substituting the 



scalar products with the definition ( 43 ) 
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Figure 2: Geometrical representation of the numerator of the optimal test Tq in (44). The 
length of the red line segment corresponds to the value of the numerator. 



From this geometrical interpretation it is clear that if the spectrum ^ corre- 
sponds to the null spectrum then the two projections are equal and the 
result of the test is zero. On the other side, if the spectrum is the alternative 
spectrum ^, then the value of the test is 

^r^^ = ^^y'((e,e>)- jfrfl (45) 

which is the maximum value over all possible tests in the alternative scenario. 
The same expression, but with a minus sign, corresponds to the minimum 
value. 

The denominator of the test is the square root of the matrix element of the 
linear operator 1 — P^o between ^ and itself. Note that both the numerator 
and the denominator of the test do not change by adding any (possibly 
negative) multiple of ^° to ^, because lies in the kernel of 1 — P^o. This 
means that optimal tests depend only on the expected deviations from the 
null spectrum in the alternative scenario. The result of the test is maximum 
when the deviations of the data from the null spectrum correspond exactly to 
the expected ones, and it is minimum when they are opposite to the expected 
ones. 

4-3. Scaling of optimal tests 

Optimal tests have weights proportional to the expected allele distribu- 
tion Ui = ^i/ YTjZiij ^'^^ to the null allele distribution u'^ = C°/X]j=i ^j- 
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Therefore the weights of an optimal test follow the same scaling with sample 
size as the allele distributions. Denoting by ^(/) and ^°(/) the spectra of 
expected and null allele frequencies in the whole population, the spectra for 
the sample are obtained by binomial sampling 



1 'n 



^^= / dfPU^njm) PUr,nJ)=( , (46) 



from the spectra ^(/) = ^(/) and ^(/) = ^°(/) respectively. Therefore the 
scaling of the allele distributions is 



6 iUdfii)fi^-fr"m 



TT'l jl/^^ df (!-/"-(!- m af) 



lU df (1 - /" - (1 - m ^°(/) 



(47) 



(48) 



which is precisely the scaling (36), (37) with weight functions oo{f) oc ^(/) 



and co'if) oc ^°(/). This scaling does not correspond to the scaling (15) 
suggested in this paper, but it takes into account the sampling process in a 
straightforward way, being based on the expected and null allele distributions 
for the sample and therefore immediately related to the binomial sampling 
of alleles from the population. 



Note that these weights actually follow the scaling (15) for large n, with 
the same weight functions w(/) cx: ^{f) and uj'{f) oc ^°(/). This agrees 
with the fact that when the sample size is large enough, the variance of the 
sampling process can be safely ignored and all reasonable choices of scaling 



are equivalent to our proposal (15). 



4-4- Generalizing D' for optimal test 

The D' statistics of [12j can be generalized for optimal tests as it was done 
for general linear tests. In particular, for a fixed number of segregating sites 
Sobs, the generalization for optimal tests in the approximation of unlinked 
sites and 6* ^ 1 is 



t:=i ^s/^? Er=7te/^o..)(ev Q/iiV e-zI e°) - 1 

(49) 



^« " min,(fi,/^0)5„,. min, ((a/ Er="i' a)/(47 E^i ^f)) " 1 
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which has the interesting property of depending only on the allele frequency 
distributions. 

However in the case of optimal tests there is another possibility, namely 
to define a D' test as the ratio of the optimal test and of its average mini- 
mum, assuming that the spectrum corresponds to the average spectrum of 
the actual scenario. This would correspond to the form 

Souian^{{{U)) - ((e°,e»v((e°,e°))) m.i)) - {{e.i)?i{{e,m 

(50) 

which has the interesting property of being symmetric with respect to the 
actual spectrum ^ and the expected spectrum ^. 

4.5. Linear tests with maximum power 

The condition for optimal tests is the maximization of S{Tq) under the 
alternative scenario. However, a better approach would by the maximization 
of the power of the test to reject the neutral model in the alternative scenario, 
given a choice of significance level a. This approach require the knowledge of 
the form of the probability distributions p(Tq = t\HQ), p{Tfi = t\Hi) where 
Ho and Hi are the null and alternative model, or equivalently of all the 
moments of the spectrum E{^i^j$,k • • •) and ^(^i^j^fe • • •)• 

Since this information is usually not available in analytic form and hard 
to obtain computationally, we limit to the case where the distribution can 
be well approximated by a Gaussian both for the null and for the expected 
model. Then the only information neeeded are the spectra /Xj = E{E,i), fit = 
and their covariance matrices = E{^i^j)—E{^i)E{C^j), Cij = — 

mm,)- 

We expect that both in this approximation and in the general case, the 
tests with maximum power will depend on the significance level chosen, there- 
fore limiting the interest of these test and the possibilities of comparison 
between results of the test on samples from different experiments. 

We call r = erf~^(l — 2a) the 2;- value corresponding to the critical p- 
value a. In the Gaussian approximation, the power is given by the following 
expression 



Power = - 
2 



1 + erf 



(51) 
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then its maximization is equivalent to the maximization of 



^ (52) 

In the general case, the weights corresponding to the maximum depend ex- 
plicitly on r and therefore on a. This dependence is expected but unwanted, 
since the interpretation of the test depends explicitly on the critical p-value 
chosen. Obtaining explicit solutions for the weights is complicated and will 
not be discussed here. 

There is only one case with weights independent on r, that is the case of 
Cij (approximately) proportional to c^-. In this case the maximization of the 
power of the test is (approximately) equivalent to the maximization of the 
average result of the test, which is precisely our condition for optimal tests. 

There is also a regime of values of a such that the weights corresponding to 
maximum power are independent on a, that is, the regime r(a) ^ 1. In this 
case the power is an increasing function of ^ Cjkfljflk/J2i m c«m^i^m and 
the weights are simply given by the null eigenvector (or linear combination 
of null eigenvectors) of the matrix Cij — x^ij, where x is uniquely defined by 
the requirement that Cij — xcij be a negative semidefinite matrix with at least 
a null eigenvalue. However, this regime is uninteresting because such small 
significance levels are practically useless (if r ~ 10, the corresponding critical 
p- value is a ~ 10~^°). 



In our opinion, the maximization of (52) is not interesting in practice 
because of the dependence on a and of the complicated form of the corre- 
sponding weights. Optimal tests represent a good compromise between high 
power, simplicity and easiness to interpret and compare the results. However, 
it could be possible to build interesting tests with higher power by selecting a 
linear combinations of the weights of the two a-independent tests discussed 
above, that is, optimal tests and tests that maximize the alternative/null 
variance ratio Z]j,fc Cjfc^i^fc/Z]«,m Cim^/^»: 



5. Beyond linear neutrality tests 

5.1. Quadratic and nonlinear tests 

Almost all the neutrality tests proposed in the literature are linear in the 
spectrum ^j. As far as we know, there is only one exception, namely the 



23 



test of Fu [TT]. This test is a quadratic polynomial reminescent of Hotelling's 
statistics for the different components of the spectra: 

n-1 

where c^^ is the inverse of the covariance matrix Cov(,^j,^j). Actually the 
test proposed by Fu is an approximation to this test with a different normal- 
ization, namely 

In this approximation the off diagonal terms in the covariance can be ne- 
glected [9l [11] . For large samples, the distribution of the results of the test 
G tends to a distribution with n — 1 degrees of freedom. 

Fu's approach cannot be extended to general quadratic or higher order 
tests, because the distribution of the results of the test would be generally 
unknown and not positive definite. For this reason we propose to rescale the 
tests to have zero mean and variance 1. With this normalization, we expect 
that the distribution would asymptotically converge to a Gaussian A^(0, 1) 
for all tests. As an example, the (re) normalized version of Fu's test would be 

Ig - . [Ob) 
^Var (eSi ^(^^ - ^Lm^ ' OL^^) - (n - 1)) 

Since the only difference between this test and the original one is the nor- 
malization and a shift by a constant factor n — 1, the power of the test is the 
same. 

Now we present a systematic discussion of nonlinear tests that are generic 
polynomials (or eventually power series) in the spectrum ^j. All the tests are 
rescaled to be centered (i.e., to have zero mean) and have variance 1. We 
denote by ^ijk... the moments of the spectrum under the null model, that is, 
^ijk... = E{^i^jS,k ■ ■ ■)■ With this definition, /ij = 6L^^. Note that all these 
moments depend on 9. In the approximation of unlinked (independent) sites 
and small 6, the second moments are equal to = 6L^^5ij + O^L'^^^C,^. 

The weights of general nonlinear tests can depend explicitly on 6, as seen 
in the previous example. To compute the values of the tests, the (unknown) 



24 



parameter 9 is substituted with an estimator 9. Unlike the hnear case, in 
this case there are two different classes of tests, related to the dependence on 
9 of the centeredness: strongly centered and weakly centered tests. 

Strongly centered tests are tests that are always centered for any value 
of 9, even if it is different from the actual value of 9. The general form for 
strongly centered tests is 

J-n — — I - 

var {y::^ + + E ■ "Li ^S^^oa + ■ ■ ■) 



with the real symmetric weights satisfying the set of conditions 



(56) 



n—l n—1 



= Y.^^f^+Y.^f^ + --- ' ^ = 1,2,3... (57) 

1=1 'i',j=^ 

where we denote by the p-th term of the Taylor expansion with respect 
to 9L of Aijj/c..M The sum can be limited to polynomials of some finite order 
in or it can be a (convergent) power series. If we introduce the notation 
I = ijk . . . to denote a group of ni indices, we can rewrite the test in the 
simpler form 

T„ ^ , (68) 



Var(E,Sii"''«...Oi) 



with the conditions 

I 



= ^n("'V^^ , m = 1,2,3... (59) 



If we constrain these tests to be first order polynomials in ^j, we recover 
the linear case with Q^^^ = Note that linear tests are always strongly 

centered. In fact in the infinite site model the spectrum is always proportional 
to 9, which consequently factorizes out by linearity and therefore has no effect 
on the centeredness. 



^In other words, fiijk... = '^Zp^^ t^^ijk ' where the coefficients ^l^]. are mdependent 
one. 
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Weakly centered tests are tests that are centered but not strongly cen- 
tered, i.e., they are centered if and only if 6 = 6. The general form for weakly 
centered tests is 



var (7 + r«e. + ESi r!f + E^ii rgi^e^e. + ■ ■ ■ 



Tr 



(60) 

with the condition 

n—1 n—1 n—1 

= 7 + E rpV. + E rgV^. + E + ■ ■ ■ (6i) 

i=l *ii=i «,j,fe=i 

where the Fij/c.., are real symmetric weights, possibly dependent on 9. We 
can simplify these expressions using the same notation as above, obtaining 
the simpler form 

^ 7 + E.ri-'K...O, 



with the condition 



Var (7 + EirS"'He...0i 



= 7 + E rS" Vi (63) 



Also for this class of tests, the sum can be limited to polynomials of fixed 
order or extended to power series. Note that the rescaled version of the G 
test by Fu presented above belongs to this class. 

The important difference between strongly and weakly centered tests is 
related to the robustness with respect to a biased estimation of 9. Since 
the class of weakly centered tests is much larger than the class of strongly 
centered ones, it should be easier to find powerful tests in the former class 
than in the latter. However, even if weakly centered tests could be more 
powerful, they would not be centered in scenarios where the value of 9 could 
not be estimated precisely. On the other side, strongly centered tests are 
robust with respect to a bad estimation of 9 and therefore they would be 
preferable in scenarios where an unbiased estimation of 9 is troublesome. 



The scaling rule (15) can be generalized to nonlinear tests in terms of 



functions ^^"'^(/i, /2 . . . /nj for strongly centered and F^"'^(/i, /2 . . . /nj for 
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weakly centered tests: 



^J^qM (64) 
\n n n J 

r("i)^ J_r("i)fl,:^,^...^ (65) 



n"i \n n n 

Fixing the precise scaling is more ambiguous than in the linear case because 
there are many different ways to preserve centeredness. For this reason the 
choice of scaling would be different for strongly and weakly centered tests 
and will not be discussed here. 

All the possible nonlinear neutrality tests based on the frequency spec- 
trum fall into one of the two classes presented in this section and have the 



form (58), (59) or (62), (63). Since both these classes contain an infinite num- 
ber of possible choices of weights, the only reasonable criterion to study gen- 
eral nonlinear tests is to select the most powerful or interesting ones. Apart 
from the Hotelling choice of Fu the most interesting choice is apparently 
the subclass of nonlinear optimal tests, which will be discussed in the next 
sections. 

5.2. Strongly centered optimal tests 

As discussed for the linear case, optimal tests depend on the expected 
alternative scenario. In the nonlinear case, in principle it would be possible 
to find generic optimal tests, but there is no clear framework to obtain them. 
For this reason we limit our study to the case of optimal tests for a specific 
alternative scenario. We denote by fiijk... = S{^i^jC,k ■ ■ •) the moments of the 
alternative spectrum for this scenario. 

Since we use the same normalization for linear and nonlinear tests, the 
optimality condition corresponds to the maximization of the expected value 
of the test under the alternative scenario 

S{Tn) = , (66) 



Var(Eif^S"'^(e.--Oi) 



and can be justified as in the linear case. 

We denote by I the ordered sequence of the indices contained inl = ijk . . . 
and by a (I) the number of distinct permutations of the sequence I, i.e. the 
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total number of permutations divided by the number of permutations that 
leave I invariant. The main result for the optimal weights is presented in this 
theorem. 

Theorem 6. The maxima of S{Tq) correspond to the weights 

ni"-' = ^) f E c^it'fe - E E E c^iivr E k) (67) 

^ ^ V L kit J,K / 

where the matrices C^^ and Aiki satisfy the identities 

^^KJ - /^k/^j) = -^ij (68) 

K 

E-^'^^E^rc'iiV^^=4. (69) 

Moreover, the variance of the corresponding unnormalized test under the null 
model is equal to its expected value under the alternative model: 



var {^ni-\i...O^=Y.nt^^n 



(70) 



Note that in general all the weights of the above optimal solution (67) 
are nonzero, therefore the maximum average value of the test for optimal 
tests built on polynomials of degree d increases with the degree d. This 
suggests that optimal tests of higher degree should be more powerful than 
linear optimal tests. 

We provide explicit formulae for the above weights for the optimal quadratic 
test in the independent sites approximation. Given E{^i) = fii and = 
fti, the relevant weights Q^^^^ are 



«-^(E..2-E,)^^-^J-,^g-^) (71) 



(2) 



1 



/ij/ij E^y y/ij V/^i ^M/ J 



(73) 
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where = ^"j/ /ii and = Yl^=i fii- -^^^ these formulae are also valid 
for the folded spectrum if the appropriate /ij and /2j are used. These results 
are discussed in Appendix |Xj 

For optimal tests of higher degree, explicit expressions become cumber- 



some and the numerical implementation of the test (67) and the matrices 



(68), (69) is more convenient. 



5.3. Weakly centered optimal tests 

In this case the optimality condition corresponds to the maximization of 
the expression 



7 + Eirl"'Vi 



S{Tr 



with the same condition 



Var (7 + Eir{"'He-..e)i) 



(74) 



(75) 



The simplest case corresponds to a first order polynomial 



Tr 



whose maximum corresponds to the optimal weights 

n— 1 n— 1 n— 1 



3=1 k=l 



(76) 



(77) 



where is the inverse matrix of the covariance matrix Cij = jjiij— HiUj. Since 
7 7^ for this optimal test, the value of this test for the specific scenario for 
which it is built is larger than than the value of the corresponding linear 
optimal test. In fact the maximum of the test is 



n— 1 n— 1 



A XlZl ~ S'fc (^fc ~ 

\ J=l k=l 



(78) 
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that should be compared to the maximum of the optimal test for the linear 
case, which can be rewritten as 



linear 



w - M ^jk (/^fc -w — 

j=i k=i 



En— 1 ■^-^n— 1 —1 
7=1 Z^k=l ^^3^ik 



(79) 

The comparison shows clearly that nonlinear optimal tests are always more 
powerful than linear optimal tests for the same scenario. 

The form of the results for the general case is similar to this simple case. 



Theorem 7. The maxima of £{Ty) correspond to the weights 

J J K 

where satisfied the identity 

E^m(/^KJ-/^K/^j) = ^ij (81) 
K 

Moreover, the variance of the corresponding unnormalized test under the null 
model is equal to its expected value under the alternative model: 

Var + E rS"^ (^ • • j = E + 7 (82) 

Also in this case, the power of optimal tests based on polynomials of 
higher degree increases with the degree of the polynomial. 

It is possible to give explicit expressions of the above matrix and moments 
for the optimal quadratic test. The formulae for the weights F^^^ for the 



30 



unfolded spectrum are 



rf^ = (s. + 2-s,)f^-iVU§-i) (83) 



Hi J 2 

r,?' = Jf^-iV (84) 



2 \n 

r!? = if^-iU^-i) (85) 



7 = ^(S^-S^)(S^ + 2-S^) (86) 

These results are valid in the independent sites approximation. They are 
also valid for the folded spectrum if the appropriate and /ij are used. An 
expression for the denominator of the test in the independent sites approxi- 
mation can be found in Appendix |Aj 

5.4- Simulations of the power of optimal tests 

Since a theoretical evaluation of the power of optimal tests of different 
degree is not possible, we evaluate numerically the power of some of these 
tests in different scenarios. We consider the best possible case, that is, we 
assume that the precise value of 6 is known. Moreover we assume unlinked 
sites and ^ <^ 1. In this approximation, as shown in Appendix |A| the mo- 
ments E{C,iC,jC,k • • •) depend only on the first moments /ij = OL^f and similarly 
S{^i^j^k ■ ■ •) depend only on /ij = 9L^i, therefore optimal tests depend only 
on the alternative and null average spectra. 

Note that for numerical simulations of optimal tests of higher degree, 
the numerical implementation can be made easier if all the occurrences of 
inverse covariance matrices C^^ in the the above formulae are replaced with 



the corresponding second moments fi-^, both in the expressions (67), (80) 



and in the definition (69). The test is the same because of the cenferedness 
condition, as it can be verified explicitly. 

We compare four optimal tests. The first two are the linear and quadratic 
strongly centered optimal tests, which are denoted by T^^^^-^ and T^^^^-^ respec- 
tively. The third test is the weakly centered optimal test T^^^-^ based on a 
first order polynomial and presented in (76). The last optimal test T^^'g) is 



also weakly centered and based on on a quadratic polynomial. The explicit 
formulae for the computation of the weights of T'^^^g) -^0(2) '^^re given in 



equations (71)- (73) and (83)- (86). 
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Figure 3: Statistical power of nonlinear optimal tests from coalescent simulations for the 
5% tail, compared with Tajima's D test (for the left and the right tail). The parameters 
for the simulations are: n = 20, 9 = 0.05, L = lOOObp, p = oo; two populations considered 
but only one sampled (for panel A); expansion factor Nq/N = 10 (for panel B). 

We simulated two demographic processes: (A) subdivision, where two 
populations having identical size exchange individuals given a symmetric 
migration rate M, then individuals are sampled from one population only; 
(B) expansion, where the population size changes by a factor Nq/N = 10 at 
a time T before present (in units of 4N generations). For each value of the 
parameters M and T, 10^ simulations were performed with mlcoalsim vl.98b 
[20] for a region of 1000 bases with variability 9 = 0.05 and recombination 
p = oo and a sample size of n = 20 (haploid) individuals. Confidence 
intervals at 95% level were estimated from 10^ simulations of the standard 
neutral coalescent with the same parameters. 



In Figure [3] we compare the power of the tests in the best possible sit- 
uation, namely when 6 is known with good precision. In this condition all 
optimal tests should give the best results. In fact, the power of weakly cen- 
tered tests (Tq^^^ and Tq^^)) is impressive, being around 100% for a large 
part of the parameter space and decreasing for large migration rates (Figure 
|3]A.) and long times (Figure [3^3) as every other test, because the frequency 
spectrum for these cases becomes very similar to the standard spectrum. So, 
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weakly centered tests show a very good theoretical performance, counterbal- 
anced by their lack of robustness. The power of Tq^^^^^ and Tq^^^) almost 
identical, therefore the contribution of the quadratic part to Tq^^^^ is probably 
not relevant. 

On the other hand, strongly centered optimal tests are quite more power- 
ful than Tajima's D but less powerful than weakly centered tests, as expected. 
However, there is a sensible difference in power between ^^'(i) -^0^(2)- 
the range of parameters where the power of weakly centered tests is around 
100%, both strongly centered tests show a good performance not so far from 
the weakly centered ones, while in the less favourable range the quadratic test 
-^o'(2)' while performing worse than the weakly centered tests, has a power 
that is 20% higher than the linear test ^o'(i)- Taking into account the robust- 
ness of the tests, these simulations show that optimal tests like Tq^2) could 
be an interesting alternative to the usual linear tests. 



6. Conclusions 

In this paper we have presented a systematic analysis of neutrality tests 
based on the site frequency spectrum. This study is intended to extend and 
complete the recent works in [S] and [TU] by extending the study of the linear 
neutrality tests recently presented by Achaz, their properties and the optimal 
tests that can be obtained in this framework; in a further generalization, we 
also consider the most general class of tests that can be written as a power 
series of the different components of the spectrum. The aim of the paper 
is to give mathematical guidelines to build new and more effective tests. The 



proposed guidelines are the scaling relation (15) and the optimality condition 
based on the maximization of S{Tu). Both these guidelines are thoroughly 
explained and discussed. 

While nonlinear optimal tests have been shown to be more powerful than 
linear ones (and weakly centered tests more powerful than strongly centered 
ones), power is not the only important issue: also robustness must be taken 
into consideration. In fact there are three important remarks on the rela- 
tive robustness of these tests. The first one is that, as already discussed, 
centeredness of weakly centered tests is not robust with respect to a biased 
estimate of 6, therefore these tests should be preferred to strongly centered 
tests only in situations where the value of 6 is well known or a good estimate 
is available. 
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The second remark is that neither the weights nor the results of hnear 
optimal tests do depend on the value of 9 and on the number of segregating 
sites S*, while the weights of nonlinear optimal tests have an explicit depen- 
dence on 6 and their results depend not only on the spectrum but also on S, 
therefore the interpretation of the results of these tests is more complicated. 
However, this is not necessarily true for homogeneous tests of any degree, like 
the quadratic test by Fu. An interesting development of this work could 
be a study of homogeneous tests of a given degree k satisfying the optimality 



condition, which can be easily obtained from equations (67) by restricting all 
ordered sequences of indices I, J, K, L to contain precisely k indices (along 
with some "traceless" condition, in case). These homogeneous optimal tests 
(or at least some subclass of them) should depend weakly on S. 

The third remark is that linear optimal tests have two interesting prop- 
erties that are not shared by other tests: they depend only on the deviations 
from the null spectrum and they have an easy interpretation in terms of these 
deviations, that is, they are positive if the observed deviations are similar to 
the expected ones and negative if the observed deviations are opposite to the 
expected ones. These features give an important advantage to linear optimal 
tests. 

Tests based on the frequency spectrum of polymorphic sites are fast, 
being based on simple matrix multiplications, and can be therefore applied 
to genome-wide data. Moreover, they can be used as summary statistics for 
Approximate Bayesian Computation or other statistical approaches to the 
analysis of sequence data. While linear tests are often used in this way, the 
nonlinear tests presented in this paper contain more information (related to 
the covariances and higher moments of the frequency spectrum) that could 
increase the power of these analyses. 
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A. Moments of the frequency spectrum in the independent sites 
approximat ion 

We consider the limit ^ <^ 1, L — > oo with constant 9L. The spectrum 
can be written as a sum of spectra for all sites 



Ci = J2Us) (87) 



s=l 

where each variable ^i{s) has a Bernoulli distribution ^j(s) G {0,1} with 
probabilities = 6*^° and p(0) = 1 — where ^° = 1/i under the 
standard neutral model. The expectation value of is therefore 

^ or 

and similarly £{ii) = fii = OL^i for a general model with average spectrum 

In the independent sites approximation, which is equivalent to the infinite 
recombination limit, the variables ^i{s) and ^i{s') arc i.i.d. random variables 
for s 7^ s', and more generally the random variables ^,i{s) and ^j{s') are 
independent for s ^ s'. The moments for a single site s can be calculated as 

E{UsMs)Us) ■ ■ ■) - E abc.PiUs) = a,^j{s) = b,Us) = c, . . .) 

a,b,c.. .e{0,l} 

= P{Us) = =l,Us) = 1, . . .) = PiU^) = ^)S^JSjk ■ ■ ■ (89) 

because the sum Y17=i ^^i^) ^ {O'^}' ^^^^ different allele frequencies for 
the same site are mutually exclusive. Therefore all these moments are always 
linear in 6 and are nonzero only when all indices are equal: 

Ems)Cj{s)Us) . . .) = eC%5jk . . . = %Sjk ... . (90) 
Then the second moment can be evaluated as 

s,s' s s^s' 

^L5,, - + L(L - 1) ^ = 5ijiii + iiiiij - ^ (91) 

I IJ L 
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and neglecting subleading orders in ^ or L ^ like the last term above, we can 
calculate the third and forth moments that are needed for the calculation of 

c-\ 

The final result for the moments of the spectrum is 

= l^ij ^SijUi + HiHj (92) 
E{CiCjCk) = fJ-ijk =SijSjkHi + {Sikf^if^j + Sjkfiifij + SijUiHk) + iiiiijiik (93) 

— fJ'ijkl —^ij^jk^klfJ'i + {^ik^jll^il^j + ^il^jkf^if^j + ^ij^klf^if^k)~'r 

+{Sij5jkiiilii + 5ij5jiiiiiik + Sik5kiiJ'ii^j + 5jk5kiiiiiij)+ 
+{5iiiJ,iiijiik + SjiHiiijUk + SikHiUjHi + SjkiJ,iHjHi+ 
+5ijHiHkHi + SkiHiUjHk) + l^il^jlJ-kl^i (94) 

All the results above can be applied to a general model simply by sub- 
stituting /ij with /ij. Moreover they can be applied to the folded spectrum 
by taking = 9Ln/i{n — + 6n,2i) for the standard neutral model or 
fii = 6L{li + In-i)/ (1 + 5n,2i) for general models. 

We define some quantities in order to simplify the expressions for the 
weights: 

n— 1 n— 1 n— 1 _2 

1=1 1=1 1=1 

If the spectrum is folded, all the sums in the above expressions run from 1 
to [n/2\. 

The covariance matrix Cj j is 

Cij = fiij - fiifij = Sijfii (96) 

Cij,k = fJ'ijk — fJ'ijfJ'k = Sij6jk^i + (SikHifij + SjkHiflj) (97) 
Cij^kl = f^ijkl — fJ'ijfJ'kl = SijSjk^klfJ'i + {SikSjifIifJ,j + SiiSjkfiif^j) + 

+ iSijSjkHiiJ,i + Sij5jiHiHk + SikSkifXi^j + Sjk5kiidtiJ,j)+ 

+ (SiiHiiijiik + djiHiUjHk + SikHiUjHi + SjkiJ,iiJ,jiJ,i) (98) 

with the elements Cij^k and Cij^ki that should be considered only for i < j, 



36 



k < I. It can be verified that the inverse matrix C- ^ has the form 

I, J 



2fii (S^ + 3) + 1 



Qifc —^ij^jk ^ o i^ik + Sjk) 



'^HM -{^ikSjl + SiiSjk) 



7:0ij0ik0ji — 



(99) 
(100) 
(101) 



The formulae for the weakly centered quadratic test (83 86) can be ob 



tained from these formulae and the definition (80). The corresponding vari 



ance in the denominator ( 62 ) is 



Var (^X:rS'^'^(^--Oij 



2(S^ - S,/2)2 + S^( V2 + 1 - 2Sp + S,) 
+S, - 2Ep (102) 



The matrix Ai is the inverse of the matrix A^,,/ = fi'^^C-^jj^^ 
which can be easily calculated from the above equations as 



= -T? 



^J, ' 

2 



(103) 
(104) 
(105) 



and the matrix Ai is 



M 



2/S^ 
2/S^ 4/S^ + 2/S2 



(106) 



The formulae (|71||73|) can be obtained from the formula (|106|) and from the 

(107) 
(108) 



following results: 



+ (S^ + 2 - 



1 //ii 

2 Ui 



i 



/Ui 2 V/Wj 



(109) 
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;iio) 







= -S 






= 1/2 




Eq;>f 


= 1 



(111) 

(112) 
(113) 



B. Proofs 

Proof of theorem [3l Choose a vector Aj in ]R"~^ that is orthogonal both 
to iQi and to i, that is, such that ^ . zAji7j = and ifij = 0. Since 
a/ian + (1 — a)Aj is a set of continuous functions of a, its minimum is 
also continuous in a. Moreover, the minimum is clearly positive if a = 1, 
while it is negative by construction if a = 0. The theorem follows from the 
intermediate value theorem. ■ 

Proof of theorem |4l The proof is similar to the previous one. Choose 
a function A(/) in C^iO,!) to satisfy both J^^^df fn{f)A{f) = and 

Jyj^df fA{f) = 0. (Since these conditions correspond just to two inde- 
pendent functionals of A(/) and C°°(0,1) is an infinite-dimensional linear 
space, the existence of such a function is guaranteed.) Since a//log(A^) + 
(1 — a)A(/) is a continuous functions of a and its infimum is not ±oo, its 
infimum is also continuous in a. Moreover, the infimum is clearly positive 
if a = 1, while it is negative by construction if a = 0. The theorem follows 
from the intermediate value theorem. ■ 

Proof of theorem [5j The vectors fij are a basis of the subspace M""^ c 
M"^^ defined by the condition J2i = 0; that is, the space of vectors orthog- 
onal to the vector whose components are Vi = 1. Therefore the only vectors 
iAj that are orthogonal to all the vectors in this basis are precisely of the 
form iAi oc Vi, that is, Aj = const /i. ■ 

The theorems on the form of optimal tests can be easily proved from a 
general lemma. 
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Lemma 1. Consider a function f : R \ {0} — > R 0/ the form 

m = (114) 

where w G and Q is a M x M symmetric positive matrix, and a K x M 
matrix R with K < M and maximum rank. The extrema of the function f 
restricted to the subspace Rv — are given by 

Va^a (q-^w - Q-^R^ {RQ-^R^^ RQ'^wj (115) 

The extrema with a > are maxima and the extrema with a < are minima 
of the function f. These extrema satisfy the identity 

Va ■ QVa = OiVa ' W (116) 

Proof. The existence of maxima and minima can be proved by the Weier- 
strass extreme value theorem. In fact / is continuous and invariant under a 
homothety with center in the origin of R^ and positive scale factor, therefore 
the codomain of the function on the linear subspace defined by Rv = is the 
same as the codomain of its restriction to the submanifold of unit vectors 
\v\ = 1, which is a compact space. The restriction of / is also continuous and 
the conclusion follows. To determine the extrema, the method of Lagrange 
multipliers states that it is sufficient to extremize the function 

F{v,X)^^^ + X.Rv (117) 
y/v ■ Qv 

and since there are no boundaries, this is equivalent to the solution of the 
equations 

— * — * — * 

= V,/ ^ R'^ (118) 

y/W^ {v-Qvf^ 
= Va/ ^Rv (119) 

The solution satisfies 

-v^Q-^w + Q-^R'X (120) 



V ■ Qv 
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and multiplying it by R and using ( |119 ) we obtain 

X = -[RQ-^R'y^ RQ-^w + l , R'l = 



(121) 



that can be inserted again in equation (120) to eliminate A. The resulting 



equation in v admits only solutions of the 



brm (115) and by substituting 



( 115 ) into it, it can be checked that all values of a 7^ correspond to solutions 



of (118), (119). The invariance of / under a central homothety with positive 



scale factor implies that the value of the function does not depend on \a\. 
The function is positive for a > and negative for a < 0, therefore solutions 
with a > correspond to maxima and solutions with a < correspond to 



minima. The identity (116) can be proved by substituting the solution (115). 



Proof of theorem [6l The expected values of the tests (58 ) have the same 



functional form as the function / of Lemma [T] The correspondence is the 
following: 

V ^vi = (x(l)Q^p^ (122) 

- wi = fLf (123) 

^ Qij = f^ij - (124) 



w 

Q 

R 



-> R 



kJ 



(fc) 



(125) 



and the positivity of the matrix Q is guaranteed by the positivity of the 
variance for all possible choices of the weights. Application of Lemma [T] with 
aa = 1 gives the result (67). ■ 



Proof of theorem [71 We can immediately solve equation (75) for 7 and 



substitute it in equation (74). Then 7 is a function of the other weights and 



the maximization is unconstrained. It can be seen that also in this case, the 
expected values of the tests have the same functional form as the function / 
of Lemma [Tj The correspondence is the following: 

iJ ^v. = a(i)T^p'^ (126) 

(127) 
(128) 
(129) 



R — )■ empty x M matrix 



and the positivity of the matrix Q is implied by by the positivity of the 
variance. Then the result (80) follows from Lemma [l] with a = 1. ■ 



40 



References 

[1] M. Kreitman, Nucleotide polymorphism at the alcohol dehydrogenase 
locus of Drosophila melanogaster, Nature 304 (5925) (1983) 412-417. 

[2] R. Hudson, M. Kreitman, M. Aguade, A test of neutral molecular evo- 
lution based on nucleotide data. Genetics 116 (1) (1987) 153. 

[3] R. Lewontin, J. Krakauer, Distribution of gene frequency as a test of 
the theory of the selective neutrality of polymorphisms. Genetics 74 (1) 
(1973) 175. 

[4] P. Sabeti, D. Reich, J. Higgins, H. Lcvinc, D. Richter, S. Schaffner, 
S. Gabriel, J. Platko, N. Patterson, G. McDonald, et al.. Detecting 
recent positive selection in the human genome from haplotype structure. 
Nature 419 (6909) (2002) 832-837. 

[5] F. Tajima, Statistical method for testing the neutral mutation hypoth- 
esis by DNA polymorphism. Genetics 123 (3) (1989) 585. 

[6] Y. Fu, W. Li, Statistical tests of neutrality of mutations. Genetics 133 (3) 
(1993) 693. 

[7] J. Fay, C. Wu, Hitchhiking under positive Darwinian selection. Genetics 
155 (3) (2000) 1405. 

[8] G. Achaz, Frequency Spectrum Neutrality Tests: One for All and All 
for One, Genetics 183 (1) (2009) 249. 

[9] Y. Fu, Statistical properties of segregating sites. Theoretical Population 
Biology 48 (2) (1995) 172-197. 

[10] L. Ferretti, M. Perez-Enciso, S. Ramos-Onsins, Optimal Neutrality Tests 
Based on Frequency Spectrum, Genetics (2010) genetics. 110. 118570. 

[11] Y. Fu, New statistical tests of neutrahty for DNA samples from a pop- 
ulation. Genetics 143 (1) (1996) 557. 

[12] S. Schaeffer, Molecular population genetics of sequence length diver- 
sity in the Adh region of Drosophila pseudoobscura. Genetics Research 
80 (03) (2003) 163-175. 



41 



[13] K. Schmid, S. Ramos-Onsins, H. Ringys-Beckstein, B. Weisshaar, 
T. Mitchell-Olds, A multilocus sequence survey in Arabidopsis thaliana 
reveals a genome-wide departure from a neutral model of DNA sequence 
polymorphism, Genetics 169 (3) (2005) 1601. 

[14] S. Hutter, H. Li, S. Beisswanger, D. De Lorenzo, W. Stephan, Distinctly 
different sex ratios in African and European populations of Drosophila 
melanogaster inferred from chromosomewide single nucleotide polymor- 
phism data, Genetics 177 (1) (2007) 469. 

[15] Y. Fu, Statistical tests of neutrality of mutations against population 
growth, hitchhiking and background selection, Genetics 147 (2) (1997) 
915. 

[16] K. Zeng, Y. Fu, S. Shi, C. Wu, Statistical tests for detecting positive 
selection by utilizing high-frequency variants. Genetics 174 (3) (2006) 
1431. 

[17] F. Tajima, Evolutionary relationship of DNA sequences in finite popu- 
lations. Genetics 105 (2) (1983) 437. 

[18] G. Watterson, On the number of segregating sites in genetical models 
without recombination.. Theoretical population biology 7 (2) (1975) 256. 

[19] G. Achaz, Testing for neutrality in samples with sequencing errors. Ge- 
netics 179 (3) (2008) 1409. 

[20] S. E. Ramos-Onsins, T. Mitchell-Olds, Mlcoalsim: multilocus coalescent 
simulations., Evol Bioinform Online 3 (2007) 41-44. 



42 



